    Integrating Cultural Knowledge into Artificially Intelligent Systems: Human Experiments and Computational Implementations

    With the advancement of Artificial Intelligence, it seems as if every aspect of our lives is impacted by AI in one way or another. As AI is used for everything from driving vehicles to criminal justice, it becomes crucial that it overcome any biases that might hinder its fair application. We are constantly trying to make AI more like humans, but most AI systems so far fail to address one of the main aspects of humanity: our culture and the differences between cultures. We cannot truly consider AI to have understood human reasoning without understanding culture, so it is important both for cultural information to be embedded into AI systems and for those systems to understand the differences across cultures. The approach I have chosen uses two cultural markers, motifs and rituals, because both are inherently part of any culture. Motifs are recurring elements grounded in well-known stories and tend to be very specific to individual cultures. Rituals are part of every culture in some form; while some are constant across all cultures, others are specific to individual ones. This makes the two well suited for comparison and contrast. The first two parts of this dissertation describe two cognitive psychology studies I conducted. The first examines how people understand motifs: is it true that in-culture people identify motifs better than out-culture people? My study shows this to indeed be the case. The second study tests whether motifs are recognizable in texts, regardless of whether people understand their meaning; the results confirm the hypothesis that they are. The third part discusses a survey and data collection effort around rituals. I collected data about rituals from people in various national groups and observed the differences in their responses. The main results were twofold: first, cultural differences across groups are quantifiable, prevalent, and observable with proper effort; second, the effort produced a substantial, curated, culturally sensitive dataset with a wide variety of uses across AI systems. The fourth part of the dissertation focuses on a system I built, the motif association miner, which provides information about motifs present in input text, such as associations, sources, and connotations. This output can serve as input to future systems, giving them a better understanding of motifs and bringing the culture-specific meanings of motifs into wider usage. As the final contribution, this thesis details my efforts to use the curated ritual data to improve an existing Question Answering system, showing that this method helps systems perform better in situations that vary by culture. The data and approach, which will be made publicly available, will enable others in the field to combat some of the bias in their systems.
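
    The abstract does not specify the miner's output format; purely as an illustrative sketch, output records from a system like the motif association miner might be structured as follows (all field names and values here are hypothetical, not taken from the dissertation):

        from dataclasses import dataclass, field

        @dataclass
        class MotifAssociation:
            """Hypothetical record for one motif found in input text."""
            motif: str                    # surface form found in the text
            source: str                   # story or tradition it is grounded in
            culture: str                  # culture the motif is specific to
            connotation: str              # e.g. "heroic", "ominous"
            associations: list[str] = field(default_factory=list)

        # Downstream systems would consume such records as structured input.
        record = MotifAssociation(
            motif="golden apple",
            source="European folktales",
            culture="European",
            connotation="temptation",
            associations=["discord", "reward"],
        )
        print(record.motif, record.connotation)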

    Compiler-directed Dynamic Linking for Mobile Programs

    In this paper, we present a compiler-directed technique for safe dynamic linking for mobile programs. Our technique guarantees that linking failures can occur only when a program arrives at a new execution site, and that such a failure can be delivered to the program as an error code or an exception. We use interprocedural analysis to identify the set of names that must be linked at the different sites the program executes on. We use a combination of runtime and compile-time techniques to identify the calling context and to link only the names needed in that context. Our technique is able to handle recursive programs as well as separately compiled code that may itself be able to move. We discuss language constructs for controlling the behavior of dynamic linking and the implications of some of these constructs for application structure. (Also cross-referenced as UMIACS-TR-96-81.)
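
    The paper's interprocedural analysis is not reproduced here, but its core idea can be sketched: the names that must be linked before entering a calling context are the external references of every procedure reachable from it. A minimal sketch, with a hypothetical call-graph representation:

        # Hypothetical sketch: given a call graph and per-procedure external
        # references, compute the names to link before entering a context.

        def names_to_link(entry, call_graph, extern_refs):
            """Union of external names over procedures reachable from entry."""
            needed, stack, seen = set(), [entry], set()
            while stack:
                proc = stack.pop()
                if proc in seen:
                    continue
                seen.add(proc)                       # handles recursion safely
                needed |= extern_refs.get(proc, set())
                stack.extend(call_graph.get(proc, ()))
            return needed

        call_graph = {"main": ["f"], "f": ["g", "f"]}         # f is recursive
        extern_refs = {"f": {"open"}, "g": {"send", "recv"}}
        print(sorted(names_to_link("main", call_graph, extern_refs)))
        # ['open', 'recv', 'send']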

    A Study of Internet Round-Trip Delay

    We present the results of a study of Internet round-trip delay. The links chosen include links to frequently accessed commercial hosts as well as to well-known academic and foreign hosts. Each link was studied for a 48-hour period. We attempt to answer the following questions: (1) how rapidly and in what manner does the delay change -- in this study, we focus on medium-grain (seconds/minutes) and coarse-grain (tens of minutes/hours) time-scales; (2) what does the frequency distribution of delay look like and how rapidly does it change; (3) what is a good metric to characterize the delay for the purpose of adaptation. Our conclusions are: (a) there is large temporal and spatial variation in round-trip time (RTT); (b) the RTT distribution is usually unimodal and asymmetric, with a long tail on the right-hand side; (c) RTT observations in most time periods are tightly clustered around the mode; (d) the mode is a good characteristic value for RTT distributions; (e) RTT distributions change slowly; (f) persistent changes in RTT occur slowly, while sharp changes are undone very shortly; (g) jitter in RTT observations is small; and (h) the inherent RTT occurs frequently. (Also cross-referenced as UMIACS-TR-96-97.)
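
    Since conclusion (d) recommends the mode as the characteristic RTT value, here is a minimal sketch of one way to estimate it from delay samples (the bin width is an arbitrary choice, not taken from the paper):

        from collections import Counter

        def rtt_mode(samples_ms, bin_ms=5):
            """Estimate the modal RTT by histogramming samples into bins."""
            bins = Counter(int(s // bin_ms) for s in samples_ms)
            best_bin, _ = bins.most_common(1)[0]
            return best_bin * bin_ms + bin_ms / 2    # midpoint of the modal bin

        # Unimodal, right-skewed samples: the mode tracks the tight cluster,
        # unlike the mean, which is pulled toward the long right tail.
        samples = [42, 43, 41, 44, 42, 43, 120, 42, 45, 41, 43, 200]
        print(rtt_mode(samples))    # 42.5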

    An Interprocedural Framework for Placement of Asynchronous I/O Operations

    Overlapping memory accesses with computation is a standard technique for improving performance on modern architectures, which have deep memory hierarchies. In this paper, we present a compiler technique for overlapping accesses to secondary memory (disks) with computation. We have developed an Interprocedural Balanced Code Placement (IBCP) framework, which performs analysis on arbitrary recursive procedures and arbitrary control flow and replaces synchronous I/O operations with a balanced pair of asynchronous operations. We demonstrate how this analysis is useful for applications which perform frequent and large accesses to secondary memory, including applications which snapshot or checkpoint their computations as well as out-of-core applications. (Also cross-referenced as UMIACS-TR-95-114.)
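
    IBCP is a compile-time transformation, but the balanced split it performs can be illustrated directly: the synchronous I/O call is replaced by an issue operation placed as early as possible and a matching wait placed as late as possible, with computation in between. A rough sketch using a thread pool as a stand-in for asynchronous disk I/O (the I/O setup here is invented for illustration):

        import concurrent.futures, io

        def compute(n):
            # work to overlap with the read
            return sum(i * i for i in range(n))

        # Before: data = f.read(); result = compute(10**6)   (read blocks)
        # After, the balanced pair: issue early, wait as late as possible.
        f = io.BytesIO(b"snapshot bytes")            # stand-in for a disk file
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(f.read)             # issue: async read starts
            result = compute(10**6)                  # computation overlaps the read
            data = future.result()                   # wait: read must finish here
        print(len(data), result)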

    Study of Scalable Declustering Algorithms for Parallel Grid Files

    Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of their state. The main challenge in efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well-known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files, with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) an analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) the development of a proximity-based declustering algorithm called minimax, which is experimentally shown to scale and to consistently achieve better response time than available algorithms while maintaining perfect disk distribution. (Also cross-referenced as UMIACS-TR-96-4.)
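
    The minimax algorithm itself is not described in the abstract; for contrast, one classic index-based scheme, disk modulo (whether it is among those evaluated is not stated here), can be sketched in a few lines: a grid block at coordinates (i, j) goes to disk (i + j) mod M, so neighboring blocks land on different disks and a range query is served in parallel:

        def disk_modulo(i, j, num_disks):
            """Index-based declustering: neighbors map to different disks."""
            return (i + j) % num_disks

        # A 2x4 range query touches 8 blocks; count hits per disk.
        M, hits = 4, {}
        for i in range(2):
            for j in range(4):
                d = disk_modulo(i, j, M)
                hits[d] = hits.get(d, 0) + 1
        print(hits)    # {0: 2, 1: 2, 2: 2, 3: 2} -> fully parallel retrieval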

    Deferred Data-Flow Analysis: Algorithms, Proofs and Applications

    Loss of precision due to the conservative nature of compile-time dataflow analysis is a general problem that impacts a wide variety of optimizations. We propose a limited form of runtime dataflow analysis, called deferred dataflow analysis (DDFA), which attempts to sharpen dataflow results by using control-flow information that is available at runtime. The overheads of runtime analysis are minimized by performing the bulk of the analysis at compile time and deferring only a summarized version of the dataflow problem to runtime. Caching and reuse of dataflow results reduce these overheads further. DDFA is an interprocedural framework and can handle arbitrary control structures, including multi-way forks, recursion, separately compiled functions, and higher-order functions. It is primarily targeted towards optimization of heavy-weight operations such as communication calls, where one can expect significant benefits from sharper dataflow analysis. We outline how DDFA can be used to optimize different kinds of heavy-weight operations, such as bulk prefetching on distributed systems and dynamic linking in mobile programs. We prove that DDFA is safe and that it yields better dataflow information than strictly compile-time dataflow analysis. (Also cross-referenced as UMIACS-TR-98-46.)
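
    The abstract gives no formal details, but the flavor of deferring a summarized dataflow problem can be shown with a toy example (not the paper's actual formulation): the compiler precomputes a sharp summary per branch, and a cheap runtime step simply selects the summary for the path actually taken instead of conservatively merging all of them:

        # Compile time: a purely static analysis must merge over all branches,
        # so the fact below would be "unknown". Summaries keep each branch sharp.
        summaries = {
            "then": {"x_linked": True},     # facts if the then-branch runs
            "else": {"x_linked": False},    # facts if the else-branch runs
        }

        def runtime_resolve(taken):
            """Deferred step: once the branch is known, use the sharper summary."""
            return summaries[taken]

        # At runtime the branch outcome is known, so the fact becomes exact:
        print(runtime_resolve("then"))      # {'x_linked': True}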

    A Customizable Simulator for Workstation Networks

    We present a customizable simulator called netsim for high-performance point-to-point workstation networks that is accurate enough to be used for application-level performance analysis, yet easy to customize for multiple architectures and software configurations. Customization is accomplished without any proprietary information, using only publicly available hardware specifications and information that can be readily determined with a suite of test programs. We customized netsim for two platforms: a 16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures with a 2-6% error on the SP-2 and a 10% error on the Alpha Farm for most test cases. It achieves this accuracy at the cost of a 7-36 fold simulation slowdown with respect to the SP-2 and a 3-8 fold slowdown with respect to the Alpha Farm. In addition, we show that cross-traffic congestion in today's high-speed point-to-point networks has little, if any, effect on application-level performance, and that modeling end-point congestion is sufficient for a reasonably accurate simulation. (Also cross-referenced as UMIACS-TR-96-68.)
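
    The finding that modeling end-point congestion suffices can be illustrated with a toy model (all parameters invented, not netsim's): messages serialize through the sending host's endpoint, and no in-network contention is modeled at all:

        def endpoint_model(msg_sizes, overhead_us=10.0, bw_bytes_per_us=100.0):
            """Serialize messages through one host endpoint; the network
            itself is assumed contention-free."""
            clock, finish = 0.0, []
            for size in msg_sizes:                  # messages queue at the endpoint
                clock += overhead_us + size / bw_bytes_per_us
                finish.append(clock)
            return finish

        print(endpoint_model([1000, 1000, 4000]))   # [20.0, 40.0, 90.0] (microseconds)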

    T2: A Customizable Parallel Database For Multi-dimensional Data

    As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Several database research groups and vendors have developed object-relational database systems to provide some support for managing and/or visualizing multi-dimensional datasets. These systems, however, provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset and the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid, and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and jointly process multiple datasets with different underlying grids. (Also cross-referenced as UMIACS-TR-98-04.)
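
    The stylized processing loop described above maps naturally onto code. A minimal sketch under assumed types, where the transformation, grid-mapping, and aggregation functions are all application-supplied placeholders:

        def process(dataset, transform, to_output_cell, aggregate, init):
            """T2-style loop: transform items, map them to output grid cells,
            and aggregate everything mapped to the same cell."""
            out = {}
            for item in dataset:
                t = transform(item)              # per-item transformation
                cell = to_output_cell(item)      # map onto the output grid
                out[cell] = aggregate(out.get(cell, init), t)
            return out

        # Toy example: sum sensor readings into a coarser 1-D output grid.
        data = [(0.1, 3.0), (0.4, 5.0), (1.2, 7.0)]    # (coordinate, value)
        sums = process(data,
                       transform=lambda it: it[1],
                       to_output_cell=lambda it: int(it[0]),   # cell index
                       aggregate=lambda acc, v: acc + v,
                       init=0.0)
        print(sums)    # {0: 8.0, 1: 7.0}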